Trading accuracy for faster entity linking
Authors
Abstract
Named entity linking (NEL) can be applied to documents such as financial reports, web pages and news articles, but state-of-the-art disambiguation techniques are currently too slow for web-scale applications because their complexity is high with respect to the number of candidates. In this paper, we accelerate NEL by taking two successful disambiguation features (popularity and context comparability) and using them to reduce the number of candidates before further disambiguation takes place. Popularity is measured by in-link score, and context similarity is measured by locality sensitive hashing. We present a novel approach to locality sensitive hashing which embeds the projection matrix into a smaller array and extracts columns of the projection matrix using feature hashing, resulting in a low-memory approximation. We run the linker on a test set in 63% of the baseline time with an accuracy loss of 0.72%.
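The low-memory hashing idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the pool size, the signature length, the use of CRC32 as the feature hash, and the names `pool`, `column`, `signature` and `hamming` are all assumptions made for illustration. The sketch stores one small array of Gaussian values and, for each context feature, hashes the feature name to an offset in that array; the slice starting at that offset stands in for the feature's column of the full random projection matrix.

```python
import zlib
from typing import Mapping

import numpy as np

POOL_SIZE = 4096   # assumed size of the shared array (hypothetical value)
N_BITS = 64        # assumed signature length in bits (hypothetical value)

# One small array replaces the full (N_BITS x vocabulary) projection matrix.
# It is padded by N_BITS so that a slice never runs past the end.
rng = np.random.default_rng(0)
pool = rng.standard_normal(POOL_SIZE + N_BITS)


def column(feature: str) -> np.ndarray:
    """Return the pseudo-column of the projection matrix for one feature.

    The feature name is hashed (CRC32 here, an arbitrary choice) to an
    offset into the pool; columns of different features may overlap.
    """
    start = zlib.crc32(feature.encode("utf-8")) % POOL_SIZE
    return pool[start:start + N_BITS]


def signature(weights: Mapping[str, float]) -> np.ndarray:
    """Sign-random-projection signature of a sparse bag-of-words context."""
    acc = np.zeros(N_BITS)
    for feature, weight in weights.items():
        acc += weight * column(feature)
    return acc > 0


def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Number of differing bits; smaller means more similar contexts."""
    return int(np.count_nonzero(a != b))


mention = signature({"bank": 2.0, "loan": 1.0, "interest": 1.0})
finance = signature({"bank": 1.0, "loan": 2.0, "credit": 1.0})
river = signature({"river": 2.0, "water": 1.0, "shore": 1.0})
print(hamming(mention, finance), hamming(mention, river))
```

In a pruning step of the kind the abstract describes, candidates whose signatures lie far from the mention's signature in Hamming distance could be dropped before the more expensive disambiguation runs. Overlapping slices make the pseudo-columns correlated, which is the approximation traded for the memory saving.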
Similar resources
The Effect of Transitive Closure on the Calibration of Logistic Regression for Entity Resolution
This paper describes a series of experiments in using logistic regression machine learning as a method for entity resolution. From these experiments the authors concluded that when a supervised ML algorithm is trained to classify a pair of entity references as linked or not linked pair, the evaluation of the model’s performance should take into account the transitive closure of its pairwise lin...
Estimating the Parameters for Linking Unstandardized References with the Matrix Comparator
This paper discusses recent research on methods for estimating configuration parameters for the Matrix Comparator used for linking unstandardized or heterogeneously standardized references. The matrix comparator computes the aggregate similarity between the tokens (words) in a pair of references. The two most critical parameters for the matrix comparator for obtaining the best linking results a...
Computerized Linking of Capital Markets - A Viable Approach
Interlinking capital markets has always been an interesting issue since it not only provides more investment opportunities but also reduces the risk of market volatility by increasing the size of the market. However, global and local barriers such as different currencies, legal issues, settlement risks and costs prevent such interlinkage from taking place efficiently. In this paper, ...
UBC Entity Linking at TAC-KBP 2013: random forests for high accuracy
This paper describes our systems and the different runs submitted for the Entity Linking task at TAC-KBP 2013. We developed two systems: one is a generative entity linking model and the other is a supervised system that reuses the scores of the previous model using random forests. Our main research interest is the Named Entity Disambiguation task and we thus performed a very naive clustering of NIL instance...
Faster (and Better) Entity Linking with Cascades
Entity linking requires ranking thousands of candidates for each query, a time consuming process and a challenge for large scale linking. Many systems rely on prediction cascades to efficiently rank candidates. However, the design of these cascades often requires manual decisions about pruning and feature use, limiting the effectiveness of cascades. We present Slinky, a modular, flexible, fast ...